Path-Source Oriented Session Identification Based on Linked Referrers and Log Indexing
Authors
Abstract
Web usage mining has been widely adopted in fields such as site-structure optimization, user-behavior analysis, personalized web services, and system performance tuning. Although much research has been done on web log mining algorithms and log preprocessing techniques, efficient retrieval of structured content for web log mining is seldom reported. In this paper, we first show that people are much more interested in discovering user navigation based on various path-sources. Then, we present a novel session identification algorithm, Referrer Link, which discovers linked referrers to serve source-oriented path mining. Next, we introduce an efficient web log indexing and path extraction technique that provides structured web log data for general-purpose log mining. The experimental results show that mining conducted on the sessions discovered by the proposed Referrer Link algorithm is, on average, 10% more accurate than mining conducted on sessions produced by the Time-out approach.
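The abstract does not spell out how Referrer Link works, so the following is only a minimal sketch of the general contrast it draws: a conventional time-out sessionizer next to a generic referrer-chaining heuristic. It is not the authors' algorithm. The `Hit` record, the function names, the 1800-second gap, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    ip: str        # user key (assumption: one IP = one user)
    time: float    # request time in seconds
    url: str       # requested page
    referrer: str  # referring URL, "" when absent

def timeout_sessions(hits, gap=1800):
    """Baseline: start a new session when a user's idle time exceeds `gap`."""
    result, last = [], {}  # last: ip -> (current session, time of last hit)
    for h in sorted(hits, key=lambda h: h.time):
        prev = last.get(h.ip)
        if prev is None or h.time - prev[1] > gap:
            session = [h]
            result.append(session)
        else:
            session = prev[0]
            session.append(h)
        last[h.ip] = (session, h.time)
    return result

def referrer_sessions(hits):
    """Heuristic: attach a hit to the user's most recent session that already
    contains its referrer; otherwise open a new session (a new path-source)."""
    result, by_ip = [], {}
    for h in sorted(hits, key=lambda h: h.time):
        target = None
        for s in reversed(by_ip.setdefault(h.ip, [])):
            if any(p.url == h.referrer for p in s):
                target = s
                break
        if target is None:
            target = []
            by_ip[h.ip].append(target)
            result.append(target)
        target.append(h)
    return result

# Hypothetical log: one user, two interleaved paths, one long pause.
hits = [
    Hit("10.0.0.1", 0, "/a", "http://search.example/?q=x"),
    Hit("10.0.0.1", 10, "/b", "/a"),
    Hit("10.0.0.1", 20, "/c", ""),      # direct entry, no referrer
    Hit("10.0.0.1", 30, "/d", "/c"),
    Hit("10.0.0.1", 4000, "/e", "/b"),  # long pause, but linked to /b
]
by_timeout = [[h.url for h in s] for s in timeout_sessions(hits)]
by_referrer = [[h.url for h in s] for s in referrer_sessions(hits)]
```

On this sample the time-out baseline groups hits purely by elapsed time, while the referrer heuristic keeps each path attached to the source it entered from, which is the kind of grouping source-oriented path mining needs.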
Similar resources
eXist: An Open Source Native XML Database
With the advent of native and XML enabled database systems, techniques for efficiently storing, indexing and querying large collections of XML documents have become an important research topic. This paper presents the storage, indexing and query processing architecture of eXist, an Open Source native XML database system. eXist is tightly integrated with existing tools and covers most of the nat...
Indexing Moving Objects on Road Networks in P2P and Broadcasting Environments
voted to overcome this problem. The lengths of routing path are O(d n^{1/d}) for CAN and O(log n) for Chord, which are in fact the cost of search, where there are n nodes. In this paper, we propose an alternative indexing scheme not only relying on P2P but also on broadcasting environments. The contributions of this paper include first the reduction of routing path to nearly O(1) for road-oriente...
General Dynamic Routing with Per-Packet Delay Guarantees of O(distance + 1 / session rate)
A central issue in the design of modern communication networks is that of providing performance guarantees. This issue is particularly important if the networks support real-time traffic such as voice and video. The most critical performance parameter to bound is the delay experienced by a packet as it travels from its source to its destination. We study dynamic routing in a connection-oriented...
Pre Processing of Web Logs – An Improved Approach For E-Commerce Websites
In this paper an improved approach for pre processing of web logs data has been proposed and evaluated so that it can be applied for web logs of e-commerce web sites. The resultant web log data after these pre processing steps can be used for further pattern discovery and analysis that helps to provide useful prediction to enhance e-commerce. Ideally, the input for the Web Usage Mining process ...
Scaling Laws of Networking-Theoretic Capacity for Wireless Networks
Multicast is a more general session than unicast and broadcast. The latter two can be regarded as two special cases of multicast indeed. In this paper, we focus on the networking-theoretic multicast capacity bounds for both random extended networks (REN) and random dense networks (RDN) under Gaussian Channel model, when all wireless nodes are individually power-constrained. During the transmissi...
Journal title:
Volume, issue
Pages -
Publication date: 2011